Add option -g: kill all processes within a process group #247
cooool
Will this work to kill Slack cleanly?
Thanks for the PR! Two comments in the review, otherwise good.
MANPAGE.md
```diff
@@ -104,6 +104,9 @@ When earlyoom is run through its default systemd service, the `-p` switch doesn'
#### -n
Enable notifications via d-bus.


#### -g
Kill all processes that are in the same process group as the one with excessive memory usage.
```
Please give a short description of when a user should enable this flag.
Ok. Added a more detailed description.
kill.c
```diff
@@ -73,6 +73,9 @@ int kill_wait(const poll_loop_args_t* args, pid_t pid, int sig)
    }
    meminfo_t m = { 0 };
    const unsigned poll_ms = 100;
    if (args->kill_process_group) {
        pid = -getpgid(pid);
```
Error handling, please! getpgid() can fail.
Sure. I'm not sure how the error code passed to fatal() is chosen, so for now I just picked 7 arbitrarily.
- Error handling of getpgid()
- Describe when this flag should be enabled
kill.c
```diff
@@ -73,6 +73,12 @@ int kill_wait(const poll_loop_args_t* args, pid_t pid, int sig)
    }
    meminfo_t m = { 0 };
    const unsigned poll_ms = 100;
    if (args->kill_process_group) {
        if ((pid = getpgid(pid)) < 0) {
            fatal(7, "%s: could not get PGID: %s", __func__, strerror(errno));
```
A fatal error is too much here ;)
getpgid() can fail when the process has already exited; this is normal. Please just `return res` as below.
That sounds more reasonable indeed. Thanks!
Code is good now, thanks! But I still need to understand when this should be used. Does Firefox create a process group for itself? I guess you use it yourself; what for?
I figured out that you can use […]. Problem: gnome-shell and Chrome share the same PGID.
I now also started Atom and Firefox: https://gist.github.com/rfjakob/c5f463d0fe256fa3571f384c27074a85
Thanks for the review! We maintain several GPU servers (NVIDIA GPUs, to be precise) with EarlyOOM installed, which primarily run deep-learning jobs such as PyTorch or TensorFlow applications. CC @oToToT and @WillyPillow, who are also in charge of this issue.
This seems to be the same on my Ubuntu 20.04 VM, unfortunately.
I'm using KDE and not seeing this; things look separated as I expect. But I'm curious: I launch multiple Firefox profiles from the same shell and they end up in their own separate process groups. Why is this not happening for you?
To elaborate on this, we observed earlyoom killing the parent processes of programs using GPUs, leaving the (now orphaned) child processes holding on to the allocated VRAM. While killing the whole process group is rather blunt, as your examples show, it should be fine in our use case. This is because i) we're essentially using it to deal with misbehaving users/processes, so even killing all processes belonging to that user is acceptable, and ii) killing the process group usually seems to kill the respective login session, which is fairly reasonable from the point of view of a shell-account server.
Another data point: on awesomeWM, the PGIDs seem properly separated. This probably depends heavily on how the DE/launcher spawns applications.
I added some warnings in […].
Merged as 8f4b654, thanks!
Thanks!
Add an option that allows EarlyOOM to kill all processes that are in the same process group as the one with excessive memory usage.